Context informs pragmatic interpretation in vision-language models

Tan, Alvin Wei Ming, Prystawski, Ben, Boyce, Veronica, Frank, Michael C.

arXiv.org Artificial Intelligence

Iterated reference games - in which players repeatedly pick out novel referents using language - present a test case for agents' ability to perform context-sensitive pragmatic reasoning in multi-turn linguistic environments. We tested humans and vision-language models on trials from iterated reference games, varying the given context in terms of amount, order, and relevance. Without relevant context, models were above chance but substantially worse than humans. However, with relevant context, model performance increased dramatically over trials. Few-shot reference games with abstract referents remain a difficult task for machine learning models.


Efficient Last-Iterate Convergence in Regret Minimization via Adaptive Reward Transformation

Ren, Hang, Wu, Yulin, Qi, Shuhan, Zhang, Jiajia, Sun, Xiaozhen, Ma, Tianzi, Wang, Xuan

arXiv.org Artificial Intelligence

Regret minimization is a powerful method for finding Nash equilibria in Normal-Form Games (NFGs) and Extensive-Form Games (EFGs), but it typically guarantees convergence only for the average strategy. However, computing the average strategy requires significant computational resources or introduces additional errors, limiting its practical applicability. The Reward Transformation (RT) framework was introduced into regret minimization to achieve last-iterate convergence through reward function regularization. However, it faces practical challenges: its performance is highly sensitive to manually tuned parameters, which often deviate from theoretical convergence conditions, leading to slow convergence, oscillations, or stagnation in local optima. Inspired by prior work, we propose an adaptive technique that addresses these issues, ensuring better consistency between theoretical guarantees and practical performance for RT Regret Matching (RTRM), RT Counterfactual Regret Minimization (RTCFR), and their variants in solving NFGs and EFGs. Our adaptive methods dynamically adjust parameters, balancing exploration and exploitation while improving regret accumulation, ultimately achieving asymptotic last-iterate convergence at a linear rate. Experimental results demonstrate that our methods significantly accelerate convergence, outperforming state-of-the-art algorithms.
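As background for the RT variants above, plain Regret Matching in self-play illustrates why only the *average* strategy converges. The sketch below runs it on rock-paper-scissors (a minimal toy, not the authors' RTRM; the payoff matrix and iteration count are illustrative choices):

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

def rm_strategy(regrets):
    """Strategy proportional to positive cumulative regrets (uniform if none are positive)."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def self_play(iters=20000):
    r1 = np.zeros(3); r2 = np.zeros(3)      # cumulative regrets
    avg1 = np.zeros(3); avg2 = np.zeros(3)  # strategy sums for averaging
    for _ in range(iters):
        s1, s2 = rm_strategy(r1), rm_strategy(r2)
        avg1 += s1; avg2 += s2
        u1 = A @ s2            # expected payoff of each pure action for player 1
        u2 = -A.T @ s1         # zero-sum: player 2's payoffs are negated
        r1 += u1 - s1 @ u1     # accumulate instantaneous regrets
        r2 += u2 - s2 @ u2
    return avg1 / iters, avg2 / iters

avg1, avg2 = self_play()
print(np.round(avg1, 3))  # average strategy approaches the uniform equilibrium
```

The current iterates `s1`, `s2` keep cycling; only the averages approach the equilibrium. RT-style reward regularization targets exactly this gap by making the last iterate itself converge.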




Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment

Wang, Mingzhi, Ma, Chengdong, Chen, Qizhi, Meng, Linjian, Han, Yang, Xiao, Jiancong, Zhang, Zhaowei, Huo, Jing, Su, Weijie J., Yang, Yaodong

arXiv.org Artificial Intelligence

Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of Reinforcement Learning from Human Feedback (RLHF), self-play not only boosts Large Language Model (LLM) performance but also overcomes the limitations of traditional Bradley-Terry (BT) model assumptions by finding the Nash equilibrium (NE) of a preference-based, two-player constant-sum game. However, existing methods either guarantee only average-iterate convergence, incurring high storage and inference costs, or converge to the NE of a regularized game, failing to accurately reflect true human preferences. In this paper, we introduce Magnetic Preference Optimization (MPO), a novel approach capable of achieving last-iterate convergence to the NE of the original game, effectively overcoming the limitations of existing methods. Building upon Magnetic Mirror Descent (MMD), MPO attains a linear convergence rate, making it particularly suitable for fine-tuning LLMs. To ensure our algorithm is both theoretically sound and practically viable, we present a simple yet effective implementation that adapts the theoretical insights to the RLHF setting. Empirical results demonstrate that MPO can significantly enhance the performance of LLMs, highlighting the potential of self-play methods in alignment.
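The Magnetic Mirror Descent building block that MPO extends has a simple closed form under an entropy mirror map. Below is a minimal sketch on rock-paper-scissors with a fixed uniform magnet (an illustrative toy under assumed step sizes, not the RLHF implementation; MPO's contribution is in addition updating the magnet so the last iterate reaches the NE of the original rather than the regularized game):

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

def mmd_step(pi, q, magnet, eta, alpha):
    """One MMD update with entropy mirror map and KL 'magnet' regularization.
    Closed form: pi'(a) ∝ (pi(a) * exp(eta*q(a)) * magnet(a)**(eta*alpha))**(1/(1+eta*alpha))."""
    logits = (np.log(pi) + eta * q + eta * alpha * np.log(magnet)) / (1 + eta * alpha)
    p = np.exp(logits - logits.max())  # shift for numerical stability
    return p / p.sum()

def run(iters=5000, eta=0.04, alpha=0.5):
    magnet = np.full(3, 1/3)
    p1 = np.array([0.8, 0.1, 0.1])  # deliberately off-equilibrium start
    p2 = np.array([0.1, 0.8, 0.1])
    for _ in range(iters):
        q1 = A @ p2
        q2 = -A.T @ p1
        p1 = mmd_step(p1, q1, magnet, eta, alpha)
        p2 = mmd_step(p2, q2, magnet, eta, alpha)
    return p1, p2

p1, p2 = run()
print(np.round(p1, 3))  # the last iterate itself converges, no averaging needed
```

With a fixed uniform magnet, the last iterate converges linearly to the regularized equilibrium (here uniform, by symmetry); the abstract's point is that moving the magnet lets this convergence target the original game's NE.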


Reviews: A Unified Framework for Extensive-Form Game Abstraction with Bounds

Neural Information Processing Systems

This paper advances a line of work exploring how to approximate the Nash equilibrium of a game that's too large to compute directly. The idea is to create a smaller abstraction of the game by combining information sets, solve for equilibrium in the smaller game, then map the solution back to the original game. The topic relates to NIPS since this is a state-of-the-art method to program game-playing AI agents like poker bots. The authors prove new bounds on the error of the approximation that are very general. They provide the first general proof that an ε′-Nash equilibrium in an abstraction leads to an ε-Nash equilibrium in the original game.


Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

Sokota, Samuel, D'Orazio, Ryan, Ling, Chun Kai, Wu, David J., Kolter, J. Zico, Brown, Noam

arXiv.org Artificial Intelligence

In their seminal work, Nayyar et al. (2013) showed that imperfect information can be abstracted away from common-payoff games by having players publicly announce their policies as they play. This insight underpins sound solvers and decision-time planning algorithms for common-payoff games. Unfortunately, a naive application of the same insight to two-player zero-sum games fails because Nash equilibria of the game with public policy announcements may not correspond to Nash equilibria of the original game. As a consequence, existing sound decision-time planning algorithms require complicated additional mechanisms that have unappealing properties. The main contribution of this work is showing that certain regularized equilibria do not possess the aforementioned non-correspondence problem -- thus, computing them can be treated as a perfect-information problem. Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and yields a simplified framework for decision-time planning in such games, devoid of the unappealing properties that plague existing decision-time planning approaches.


VR Assassin's Creed, Stranger Things and Ghostbusters arrive on Meta Quest later this year

Engadget

Meta announced a slate of upcoming games today for its standalone VR headsets (including the upcoming Meta Quest 3). Apple is expected to enter the virtual headset space next week, so Meta is hoping to make a lasting impression with its lineup of upcoming VR titles from beloved franchises, including Assassin's Creed, Stranger Things, Ghostbusters and Attack on Titan -- along with some VR remakes of old-school classics. In addition to Asgard's Wrath 2, the most enticing game may be the one we know the least about. Although it was little more than a tease, Meta confirmed that Assassin's Creed Nexus VR isn't vaporware after all: The next VR installment in the long-running series will launch in the Meta Quest Store later this year. Unfortunately, further details must wait for its official reveal at Ubisoft Forward on June 12th.